Feat: sno leader election config #1242
openshift-merge-robot merged 2 commits into openshift:master from
Conversation
Signed-off-by: ehila <ehila@redhat.com>
|
To give a bit more context, some of the numbers I was seeing are pasted below along with the queries run against Prometheus. The methodology was to create two 4.9 SNO-DU clusters (leader-election-new and leader-election-oc), both with CVO scaled down, where the only modification was the leader election configuration on the leader-election-new cluster. Things I tried to verify were the downtime support for the APIServer in SNO clusters described in this bz: rolling out the APIServer didn't cause any restarts on the operators. Similarly, rolling out the operator didn't cause any restarts, and the operands behaved correctly without any restarts as well.

Queries Ran (the trailing * 1000 converts cores to millicores)

95th Quantile Over Time (CPU Usage) - API Server
quantile_over_time(
.95,
sum(
rate(container_cpu_usage_seconds_total{
container!="POD",
pod=~"kube-apiserver-leader-election-oc-49|kube-apiserver-leader-election-new-49",
namespace="openshift-kube-apiserver"}[5m]
)
)[6h:]
) * 1000

Average Over Time (CPU Usage) - API Server
avg_over_time(
sum(
rate(container_cpu_usage_seconds_total{
container!="POD",
pod=~"kube-apiserver-leader-election-oc-49|kube-apiserver-leader-election-new-49",
namespace="openshift-kube-apiserver"}[5m]
)
)[6h:]
) * 1000
95th Quantile Over Time (CPU Usage) - Operators
quantile_over_time(
.95,
sum(
rate(container_cpu_usage_seconds_total{
container!="POD",
namespace=~"openshift-apiserver-operator|openshift-authentication-operator|openshift-cluster-storage-operator|openshift-console-operator|openshift-controller-manager-operator|openshift-etcd-operator|openshift-kube-apiserver-operator|openshift-kube-controller-manager-operator|openshift-kube-scheduler-operator|openshift-kube-storage-version-migrator-operator|openshift-service-ca-operator"
}[5m]
)
)[6h:]
) * 1000

Average Over Time (CPU Usage) - Operators
avg_over_time(
sum(
rate(container_cpu_usage_seconds_total{
container!="POD",
namespace=~"openshift-apiserver-operator|openshift-authentication-operator|openshift-cluster-storage-operator|openshift-console-operator|openshift-controller-manager-operator|openshift-etcd-operator|openshift-kube-apiserver-operator|openshift-kube-controller-manager-operator|openshift-kube-scheduler-operator|openshift-kube-storage-version-migrator-operator|openshift-service-ca-operator"
}[5m]
)
)[6h:]
) * 1000 |
|
Summarizing the overall benefit to an SNO cluster over X timeframe will be helpful for weighing the tradeoff of slightly lower CPU cost at the expense of latency of lease reacquisition. |
|
Sure thing, a quick summary with some new numbers. I ran the clusters under load, using kube-burner to spin up some traffic against the API Server and keep some workloads in the cluster. The latest numbers show a pretty consistent picture of a small reduction in CPU consumption by the changed operators.
On average, by increasing leader election to 60s, I'm seeing a reduction of
In short, with the current numbers, we would have a worst non-graceful case of
As always with things of this nature, another pair of eyes would be fantastic on the numbers, queries, and the methodology. Any input is greatly appreciated. |
// This method should only be called when running in an SNO Cluster.
func LeaderElectionSNOConfig(config configv1.LeaderElection) configv1.LeaderElection {
// We want to make sure we respect a 30s clock skew as well as 4 retry attempts without making
Can you provide color into the 30s skew and why this duration was chosen? In the case of SNO where we are essentially talking about a single clock on the control plane, it's not directly apparent where the skew would come from in regards to a lease.
That is a great call out. The idea there was to keep that mostly intact, as it's not clear to me whether there will be SNO clusters that have a worker machine, and what that arrangement might mean for where operators land. I figured I'd err on the side of caution and keep to the existing convention.
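For reference, a minimal sketch of how the 30s skew and 4 retries translate into timings, assuming the 60s retry period discussed in this thread (the exact values live in the merged helper):

    package leaderelection

    import (
    	"time"

    	configv1 "github.com/openshift/api/config/v1"
    )

    // LeaderElectionSNOConfig stretches the leader election timings for
    // single-node clusters: the renew deadline leaves room for 4 retry
    // attempts, and the lease duration adds a 30s clock-skew buffer on top.
    func LeaderElectionSNOConfig(config configv1.LeaderElection) configv1.LeaderElection {
    	ret := *config.DeepCopy()
    	ret.RetryPeriod.Duration = 60 * time.Second    // one retry every 60s
    	ret.RenewDeadline.Duration = 240 * time.Second // 4 retries * 60s
    	ret.LeaseDuration.Duration = 270 * time.Second // renew deadline + 30s skew buffer
    	return ret
    }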
} else if infraStatus.ControlPlaneTopology == configv1.SingleReplicaTopologyMode {
	snoLeaderElection := leaderelectionconverter.LeaderElectionSNOConfig(*b.leaderElection)
	b.leaderElection = &snoLeaderElection
}
Shouldn't that block move to WithleaderElection, since it's possible that LE is disabled and this code can accidentally turn it on if we're not careful enough?
That's a good point. The reason it's there is that we won't know which topology we're in until runtime. The function currently won't alter the disable flag in LE, but to make it more robust I changed the if statement to require topology == SingleReplica AND LE not disabled, and added some tests to verify the behavior. Do you think that's a good enough safeguard against accidentally turning it on?
This code is still in a spot that does not have access to the user's intent. @soltysh's comment looks unaddressed to me.
EDIT: I see the value being saved in WithleaderElection
|
/test unit |
1 similar comment
|
/test unit |
klog.Warningf("unable to get cluster infrastructure status for leader election (falling back to default): %v", err)
} else if infraStatus.ControlPlaneTopology == configv1.SingleReplicaTopologyMode && !b.leaderElection.Disable {
	snoLeaderElection := leaderelectionconverter.LeaderElectionSNOConfig(*b.leaderElection)
	b.leaderElection = &snoLeaderElection
This is stomping the code-configured leader election if someone specified their actual values. Only in WithleaderElection do you have sufficient information to know what the user provided, so that you don't overwrite it.
How you resolve this is up to you/@soltysh, but you cannot unconditionally stomp the user's intent.
I altered this a bit: I added a flag approach to keep it straightforward and introduce fewer moving parts. The idea is that if the user provides no timing configs in WithleaderElection, then it's safe to assume we can change those configs, so we fetch the infraStatus to see if we should alter the timings for SNO. @soltysh What do you think about this approach?
How you resolve this is up to you/@soltysh, but you cannot unconditionally stomp the user's intent.
@deads2k that was also my suggestion here: https://github.com/openshift/library-go/pull/1242/files#r754222946
With the recent change @soltysh, I'm using the zero values in the timings to ensure that if the user has decided to customize the timings, we won't alter them at runtime. That should make sure we don't stomp on the user's intent, and if no values were provided, then we can query the cluster and change the values if SNO is detected. That's going to be triggered here
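A rough sketch of that zero-value check, building on the LeaderElectionSNOConfig helper sketched earlier in this thread (the function names and wiring here are illustrative, not necessarily the merged code):

    package leaderelection

    import (
    	configv1 "github.com/openshift/api/config/v1"
    )

    // leaderElectionTimingsUnset treats all-zero timings as "the caller did not
    // customize anything", which makes it safe to apply the SNO defaults.
    func leaderElectionTimingsUnset(le configv1.LeaderElection) bool {
    	return le.LeaseDuration.Duration == 0 &&
    		le.RenewDeadline.Duration == 0 &&
    		le.RetryPeriod.Duration == 0
    }

    // maybeApplySNODefaults only overrides the timings when nothing was
    // customized, leader election is not disabled, and the control plane
    // topology is single replica.
    func maybeApplySNODefaults(le configv1.LeaderElection, topology configv1.TopologyMode) configv1.LeaderElection {
    	if leaderElectionTimingsUnset(le) && !le.Disable && topology == configv1.SingleReplicaTopologyMode {
    		return LeaderElectionSNOConfig(le)
    	}
    	return le
    }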
|
/approve please squash before merge. |
added functions to help default leader election config for sno clusters
updated controller builder to use sno leader election when user provides no options and we are currently running in sno control plane topology
Signed-off-by: ehila <ehila@redhat.com>

feat: added cluster infra status check for leader election
added check for sno topology for leader election config
Signed-off-by: ehila <ehila@redhat.com>

upkeep: updated warning message for clarity
Signed-off-by: ehila <ehila@redhat.com>

docs: added calculations and context
added calculations for the sno leader election numbers
added more comments around method to provide context and reasoning
Signed-off-by: ehila <ehila@redhat.com>

feat: added LE disable check and tests
added check for LE disable flag and SingleReplica before calling SNO leader election method
added tests for LeaderElectionSNOConfig and LeaderElectionDefaulting for expected behavior
Signed-off-by: ehila <ehila@redhat.com>

test: fixed test case for defaults
Signed-off-by: ehila <ehila@redhat.com>

test: changed unit test to use reflect package
changed unit test for equality check to use reflect.DeepEqual instead of assert.Equal to follow convention of other unit tests
Signed-off-by: ehila <ehila@redhat.com>

test: fix unit test when running in cluster
fixed unit test that reads in namespace from cluster as fallback
Signed-off-by: ehila <ehila@redhat.com>

fix: added nil check for infraStatus and error out if not populated
Signed-off-by: ehila <ehila@redhat.com>

fix: added flag to derive user intent on leader election
added a flag to signal if the user supplied any leader election config
added logic to allow us to override the leader election if the user has provided no configs
added unit tests for sno config logic method
Signed-off-by: ehila <ehila@redhat.com>

feat: removed un-needed method
removed GetClusterInfraStatusOrDie method, not needed
Signed-off-by: ehila <ehila@redhat.com>

refactor: changed variable name and logging message
updated flag variable name to be more explicit
inverted logic to match variable name
updated messaging to match for event recorder and logging statement
Signed-off-by: ehila <ehila@redhat.com>

feat: changed flag logic to match recommendation
Signed-off-by: ehila <ehila@redhat.com>
Force-pushed from aa75fd2 to 2612981
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k, eggfoobar, soltysh
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
SNO clusters have limited resources, and frequent leader election requests can put unneeded pressure on the kube-apiserver. This PR increases the retry period and lease duration for leader election, and exposes some helper functions to determine the topology and apply SNO leader election defaults at operator start-up.
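As a rough illustration of how an operator might consume these helpers, here is a hedged usage sketch: the namespace, lock name, and surrounding wiring are illustrative, and the import paths are an assumption of where these helpers live; only LeaderElectionDefaulting and LeaderElectionSNOConfig are taken from this PR's package.

    package example

    import (
    	"context"

    	configv1 "github.com/openshift/api/config/v1"
    	configclient "github.com/openshift/client-go/config/clientset/versioned"
    	"github.com/openshift/library-go/pkg/config/leaderelection"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // leaderElectionConfig fills in the normal defaults first, then stretches
    // the timings only when the cluster reports a single-replica control plane
    // and leader election has not been disabled by the caller.
    func leaderElectionConfig(ctx context.Context, client configclient.Interface, userConfig configv1.LeaderElection) (configv1.LeaderElection, error) {
    	le := leaderelection.LeaderElectionDefaulting(userConfig, "openshift-example-operator", "example-operator-lock")

    	infra, err := client.ConfigV1().Infrastructures().Get(ctx, "cluster", metav1.GetOptions{})
    	if err != nil {
    		return le, err
    	}
    	if infra.Status.ControlPlaneTopology == configv1.SingleReplicaTopologyMode && !le.Disable {
    		le = leaderelection.LeaderElectionSNOConfig(le)
    	}
    	return le, nil
    }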